Theoretical Results on De-Anonymization via Linkage Attacks
Abstract
Consider a database D whose records contain the transaction histories of individuals, and which has been de-identified, i.e., the variables that uniquely associate records with individuals have been removed from the data. An adversary de-anonymizes D via a linkage attack if, using some auxiliary information about a certain individual in the database, it can determine which record of D corresponds to that individual. One example is given in the article Robust De-anonymization of Large Sparse Datasets by Narayanan and Shmatikov [19], which shows that an anonymized database containing ratings of movies rented by Netflix customers could in fact be de-anonymized using very little auxiliary information, even when that information contains errors. Besides the heuristic de-anonymization of the Netflix database, Narayanan and Shmatikov provide interesting theoretical results about the de-anonymization an adversary can achieve under general conditions. In this article we revisit these theoretical results and develop them further. Our first contribution is to exhibit several simple cases in which the algorithm Scoreboard, meant to produce the theoretical de-anonymization in [19], fails to do so. By requiring 1 − sim to be a pseudo-metric, and requiring the de-anonymization algorithm to output a record with minimum support among the candidates, we obtain and prove de-anonymization results similar to those described in [19]. We then consider a new hypothesis, motivated by the observation (made in heuristic de-anonymizations) that when the auxiliary information contains values corresponding to rare attributes, the resulting de-anonymization is stronger. We formalize this using the notion of a long tail [4], and give new theorems expressing the level of de-anonymization in terms of the parameters of the tail of the database D.
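The linkage step described above can be illustrated with a small sketch. It assumes sparse records stored as attribute-to-value dictionaries, a hypothetical per-attribute similarity `value_sim` for which 1 − sim is a (truncated) metric on values, and a score equal to the worst similarity over the support of the auxiliary information, in the spirit of Scoreboard. Records scoring at least a threshold `alpha` become candidates, and, following the requirement stated above, the candidate with minimum support is returned. The names `value_sim`, `score`, `link`, `value_range`, `alpha` and `toy_db` are illustrative assumptions, not the paper's notation or code.

```python
from typing import Dict, Hashable, List, Optional

# A sparse record: attribute (e.g. a movie id) -> value (e.g. a rating 1..5).
Record = Dict[Hashable, float]


def value_sim(a: float, b: Optional[float], value_range: float = 4.0) -> float:
    """Hypothetical per-attribute similarity in [0, 1].

    1 - value_sim(a, b) == min(1, |a - b| / value_range), a (truncated) metric
    on values. An attribute missing from the record (b is None) scores 0.
    """
    if b is None:
        return 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def score(aux: Record, record: Record) -> float:
    """Worst per-attribute similarity over the support of aux (aux non-empty)."""
    return min(value_sim(v, record.get(attr)) for attr, v in aux.items())


def link(aux: Record, db: List[Record], alpha: float) -> Optional[int]:
    """Index of the chosen record, or None if no record scores at least alpha."""
    candidates = [i for i, r in enumerate(db) if score(aux, r) >= alpha]
    if not candidates:
        return None
    # Minimum-support selection: among the candidates, prefer the record with
    # the fewest non-null attributes (ties go to the lowest index).
    return min(candidates, key=lambda i: len(db[i]))


# Hypothetical toy database of movie ratings (not real data).
toy_db: List[Record] = [
    {"m1": 5, "m2": 3, "m3": 4, "m4": 1},
    {"m1": 5, "m2": 3},
    {"m1": 1, "m2": 2, "m3": 4},
]

print(link({"m1": 4, "m2": 3}, toy_db, alpha=0.75))  # -> 1
```

In this toy run the first two records match the noisy auxiliary information equally well (score 0.75), and the minimum-support rule picks the sparser second record. One plausible motivation, offered here only as an assumption, is that a record with many non-null attributes is more likely to match arbitrary auxiliary information by coincidence.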
The improvement in de-anonymization is reflected in the fact that, when at least one value in the auxiliary information corresponds to a rare attribute of D, the size of the auxiliary information can be reduced by about 50%, provided that D has a long tail. We then explore a microdata file from the Joint Canada/United States Survey of Health 2004 [22], whose records contain the answers of the survey respondents. While many of the variables relate to health issues, others relate to characteristics that individuals may readily disclose, such as physical activities (sports) or demographic characteristics. We perform an experiment with this microdata file and show that, using only some non-sensitive attribute values, it is possible, with significant probability, to link those values to the corresponding full record.
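The kind of risk measured in the survey experiment can be illustrated by counting how many records share the values an adversary knows. The sketch below assumes exact matching on a chosen set of non-sensitive attributes and a uniform guess among ties, so a target whose value combination is shared by k records is linked with probability 1/k. The function names, attribute names and toy rows are hypothetical; they are not taken from the Joint Canada/United States Survey of Health microdata, and this is not necessarily the procedure used in the paper's experiment.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def _keys(rows: List[Row], attrs: Sequence[str]) -> List[Tuple[str, ...]]:
    # The adversary's view of each row: only the values of the known attributes.
    return [tuple(row[a] for a in attrs) for row in rows]


def expected_link_rate(rows: List[Row], attrs: Sequence[str]) -> float:
    """Average probability of recovering a target's full record when the
    adversary knows the target's values on `attrs` and guesses uniformly
    among the rows that share them."""
    keys = _keys(rows, attrs)
    counts = Counter(keys)
    # A row whose value combination is shared by k rows is linked with prob. 1/k.
    return sum(1.0 / counts[k] for k in keys) / len(rows)


def unique_fraction(rows: List[Row], attrs: Sequence[str]) -> float:
    """Fraction of rows that are the only match for their values on `attrs`."""
    counts = Counter(_keys(rows, attrs))
    return sum(1 for c in counts.values() if c == 1) / len(rows)


# Hypothetical toy microdata: three easily disclosed attributes and one
# sensitive attribute ("condition") that the adversary hopes to recover.
toy_rows: List[Row] = [
    {"sex": "F", "age_group": "30-34", "sport": "swimming", "condition": "asthma"},
    {"sex": "F", "age_group": "30-34", "sport": "swimming", "condition": "none"},
    {"sex": "M", "age_group": "45-49", "sport": "cycling", "condition": "diabetes"},
    {"sex": "M", "age_group": "45-49", "sport": "golf", "condition": "none"},
]

known = ("sex", "age_group", "sport")
print(expected_link_rate(toy_rows, known))  # -> 0.75
print(unique_fraction(toy_rows, known))     # -> 0.5
```

Adding a rarely shared attribute such as `sport` splits the two male rows into unique matches, which mirrors, on a toy scale, the abstract's point that rare values strengthen linkage.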
Journal title:
- Trans. Data Privacy
Volume 5
Publication date: 2012